# spenserCPU

Spenser Fong, Hassan Farooq, Timothy Vitkin (ECE 411, SP22)



#### Overview

- Speculative Tomasulo
- Static-not-taken branch predictor
- Direct-mapped Split Caches
- 8-Entry Reorder Buffer
- 6 ALUs + 6 Single-Entry ALU Reservation Stations
- 6 Compare LUs + 6 Single-Entry Compare LU Reservation Stations

## Datapath



#### **Arbiter**



### **Implementation**

#### Design Choices:

Instead of mapping a multi-entry reservation station to a single ALU, we
mapped each single-entry reservation station to its own ALU. This meant
that we did not need to worry about combinational execution time.
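The one-station-per-ALU pairing can be illustrated with a small behavioral model. This is only a sketch in Python, not the project's SystemVerilog: the names `ReservationStation`, `issue`, and `NUM_ALUS` are hypothetical, and the "first free station" policy is an assumption. One consequence of the pairing is that an ALU never has to choose among several waiting entries, since it only ever sees its own station.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_ALUS = 6  # from the overview: 6 ALUs, each with a single-entry station


@dataclass
class ReservationStation:
    """One single-entry station paired with exactly one ALU (hypothetical model)."""
    busy: bool = False
    op: Optional[str] = None
    rob_tag: Optional[int] = None


def issue(stations: List[ReservationStation], op: str, rob_tag: int) -> Optional[int]:
    """Place the instruction in the first free station.

    Returns the station index, or None when every station is busy
    (in which case issue would have to stall).
    """
    for idx, rs in enumerate(stations):
        if not rs.busy:
            rs.busy, rs.op, rs.rob_tag = True, op, rob_tag
            return idx
    return None


stations = [ReservationStation() for _ in range(NUM_ALUS)]
# Try to issue seven instructions into six station/ALU pairs.
slots = [issue(stations, "add", tag) for tag in range(7)]
print(slots)
```

In this model the seventh instruction finds no free station, so `issue` returns `None` and the front end would need to stall until a station is released.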

### Challenges

1. We had a hard time going from theory to implementation, and spent a lot of time working out how the textbook design corresponded to actual code.
2. Instruction fetch was not stalled properly.
3. The tag array was updated one clock cycle late.
4. Many edge cases required a tag to be broadcast in the same cycle that an instruction was added to a reservation station.
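The same-cycle broadcast edge case can be sketched as follows. This is a minimal Python model under stated assumptions, not the project's code: `Operand`, `read_operand`, `reg_tag`, and the `(tag, value)` shape of `cdb` are all hypothetical names chosen for illustration. The idea is that an instruction being issued must check whether the result it is waiting for is on the common data bus in that very cycle, because it is not yet in a reservation station and would otherwise miss the broadcast.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Operand:
    value: Optional[int] = None  # set once the operand is available
    tag: Optional[int] = None    # ROB tag still being waited on, if any


def read_operand(reg: int,
                 regfile: Dict[int, int],
                 reg_tag: Dict[int, int],
                 cdb: Optional[Tuple[int, int]]) -> Operand:
    """Resolve one source operand at issue time.

    cdb is (tag, value) if a result is being broadcast this cycle, else None.
    """
    pending = reg_tag.get(reg)
    if pending is None:
        return Operand(value=regfile[reg])  # register file is up to date
    if cdb is not None and cdb[0] == pending:
        return Operand(value=cdb[1])        # forward the same-cycle broadcast
    return Operand(tag=pending)             # wait for a later broadcast


regfile = {1: 10, 2: 20}
reg_tag = {2: 5}  # register 2 is waiting on the result tagged 5

ready = read_operand(1, regfile, reg_tag, cdb=(5, 99))
forwarded = read_operand(2, regfile, reg_tag, cdb=(5, 99))
waiting = read_operand(2, regfile, reg_tag, cdb=None)
```

Without the middle branch, `forwarded` would record tag 5 and wait indefinitely for a broadcast that has already happened.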

### Performance – Checkpoint Code

|     | MP2      | MP4     | Speedup |
|-----|----------|---------|---------|
| CP1 | 33.505µs | 7.345µs | 356.16% |
| CP2 | 15.955µs | 6.405µs | 149.10% |
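The Speedup column is consistent with a percentage improvement computed as (MP2 time / MP4 time − 1) × 100, rather than a plain ratio. A short Python check of that arithmetic, using only the runtimes from the table (the helper name `speedup_percent` is hypothetical):

```python
def speedup_percent(baseline_us: float, new_us: float) -> float:
    """Percentage improvement of new_us over baseline_us."""
    return (baseline_us / new_us - 1.0) * 100.0


# (MP2 runtime, MP4 runtime) in microseconds, taken from the table above.
runtimes_us = {"CP1": (33.505, 7.345), "CP2": (15.955, 6.405)}

results = {name: speedup_percent(mp2, mp4) for name, (mp2, mp4) in runtimes_us.items()}
for name, pct in results.items():
    print(f"{name}: {pct:.2f}%")
```

Read as plain ratios, the same figures correspond to roughly 4.56x for CP1 and 2.49x for CP2.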

#### Performance – Fmax

|                 | MP2       | MP4       |
|-----------------|-----------|-----------|
| Slow 900mV 100C | 105.15MHz | 104.8MHz  |
| Slow 900mV -40C | 112.08MHz | 111.54MHz |

#### Performance – Power

|                                        | MP2       | MP4       |
|----------------------------------------|-----------|-----------|
| Total Thermal Power Dissipation        | 392.53 mW | 699.94 mW |
| Core Dynamic Thermal Power Dissipation | 33.18 mW  | 327.22 mW |
| Core Static Thermal Power Dissipation  | 318.97 mW | 322.39 mW |
| I/O Thermal Power Dissipation          | 40.39 mW  | 50.32 mW  |

### What We Wish We Did Differently

1. Use simpler structs and fewer signals between modules
2. Keep decode smaller
3. Build a simpler ROB
4. Use modports
5. Create a robust datapath diagram before starting to code

### Questions?